Estimating the expectation values of spin 1/2 observables with finite resources 
We examine the problem of estimating the expectation values of two observables when we have a 
finite number of copies of an unknown qubit state. Specifically we examine whether it is better to 
measure each of the observables separately on different copies or to perform a joint measurement of 
the observables on each copy. We find that joint measurements can sometimes provide an advantage 
over separate measurements, but only if we make estimates of an observable based solely on the 
results of measurements of that observable. If we instead use both sets of results to estimate each 
observable then we find that individual measurements will be better. Finally we consider estimating 
the expectation values of three complementary observables for an unknown qubit. 

PACS numbers: 03.67.-a, 03.65.Ta, 03.65.Wj 



I. INTRODUCTION 

One of the most fundamental questions in physics 
is how to best determine unknown properties of a 
physical system. In quantum mechanics this prob- 
lem is complicated by the probabilistic nature of the 
theory. The expectation value of an observable is, 
however, well defined in quantum mechanics. The 
conceptual problems are intensified when we seek to 
obtain information about two observable quantities. 
Classically the only limit to how well we can mea- 
sure two quantities is the skill of the experimenter 
and the available technology. In quantum mechan- 
ics, however, there exist fundamental limits on our 
ability to measure two non-commuting observables. 
It is sometimes thought that it is impossible to si- 
multaneously measure two non-commuting observ- 
ables upon the same system. This, however, is not 
the case; such joint measurements are possible at the 
expense of increased uncertainty in the measurement 
results.

One of the earliest investigations into estimating parameters of a quantum system was by
Helstrom. Parameter estimation is related to state estimation, for if we determine enough
parameters then it is possible to completely specify the state of the system. There exists a
large literature on quantum state estimation and quantum state tomography. One important
consideration for state tomography is the available number of copies of the undetermined
state. When we have a very large number of copies then the error in determining the state
will be small for all sensible protocols. If, however, there is a limited number of copies
available then it becomes a priority to find efficient means of estimating the state [10].
Similar considerations are important when trying to estimate the
expectation value of an observable. Methods that 
prove reliable when we have a very large number of 
copies of a state may not be the most efficient ap- 
proach when the number of copies is smaller. 

The problem we shall address in this paper is the
following. Suppose that we have a finite number
of copies of some unknown state of a two-level sys- 
tem. Our task is to estimate the expectation values 
of two different observables, while minimizing the 
total error in the estimates. For the case of estimat- 
ing a single observable, it was found that perform- 
ing a collective measurement on all the copies yields 
no advantage over separate measurements upon the 
individual copies of the system [11], [12]. When we
consider two observables, the added complexity may 
mean that collective measurement could provide an 
advantage. In this paper, however, we shall not con- 
sider collective measurements but will instead focus 
on simpler measurement schemes that are also more 
amenable to experimental realisation. Our aim is to 
determine whether it is better to measure a single 
observable upon each system or to perform a joint 
measurement of two or three observables upon each 
system. We shall find that if each expectation value 
is estimated using only the results for that particu- 
lar observable, then sometimes a joint measurement 
does provide an advantage. This improvement, how- 
ever, is due to the simple way the expectation values 
have been estimated. By using all the measurement 
results it is possible to improve the performance for 
the case when we measure each observable on a sep- 
arate system. 

The paper will be organized in the following way. A brief review of joint measurements will
be given in section II. In section III we will examine making separate measurements of the
observables, before considering joint measurements. We will then investigate whether biasing
the results of the measurements can improve the performance, both for separate measurements
and for joint measurements. In section V we ask whether Bayes' rule can be used to make
better use of the measurement results when we are performing individual measurements of each
observable. In section VI we shall briefly investigate estimating the x, y and z components
of spin for a spin-1/2 particle. Finally we discuss our results in section VII.

II. JOINT MEASUREMENTS OF SPIN 1/2
OBSERVABLES

A joint measurement of two observables is a si- 
multaneous measurement of both observables upon 
the same quantum system. When the observables of 
interest commute then a joint measurement can be 
accomplished with standard von Neumann or pro- 
jective quantum measurements. If the observables 
do not commute, then we must describe our mea- 
surements in terms of the probability operator mea- 
sure (POM) formalism. A detailed description of 
this generalized description of measurements can be 
found in [H, [3. 

A condition that is often used for joint measurements is the joint unbiasedness condition,
which requires that the expectation values for the jointly measured observables are
proportional to the expectation values of the observables measured separately. We thus
require that $\langle A_J \rangle = \alpha \langle A \rangle$ and
$\langle B_J \rangle = \beta \langle B \rangle$, where $\langle A_J \rangle$ and
$\langle B_J \rangle$ are the jointly measured expectation values and $\alpha$ and $\beta$
are constants of proportionality. For examples of the joint
unbiasedness condition applied to spin- 1/2 systems 
see [H, [H [H El, El. It is possible to relax the 
condition of joint unbiasedness. This leads to a more 
general description of joint measurements.

For spin-1/2 observables, $A = \mathbf{a} \cdot \boldsymbol{\sigma}$ and
$B = \mathbf{b} \cdot \boldsymbol{\sigma}$, the eigenvalues are $\pm 1$. We will choose to
assign the numerical values of $\pm 1$ to the results spin up and down also for a joint
measurement of A and B. This means that $|\alpha|$ and $|\beta|$ will vary between one and
zero. We wish the joint measurements to represent, as accurately as possible, a measurement
of both A and B. We thus want the marginal probability distributions for the joint and the
separate measurements to be as similar as possible. For spin-1/2 observables the
probabilities are exactly specified by the expectation values, hence we wish to make the
expectation values of the jointly measured observables as close as possible to the
expectation values of the separately measured observables. This is achieved by making
$\alpha$ and $\beta$ as close to one as is possible. The values of $\alpha$ and $\beta$ will
be restricted by the inequality

$$ |\alpha \mathbf{a} + \beta \mathbf{b}| + |\alpha \mathbf{a} - \beta \mathbf{b}| \le 2. \qquad (1) $$

If a joint measurement scheme allows us to saturate inequality (1), then, for given
directions $\mathbf{a}$ and $\mathbf{b}$, this measurement gives the largest possible value
of $|\alpha|$ for a given $\beta$ and vice versa. Any joint measurement of two components of
spin for which inequality (1) is saturated is in this sense an optimal
joint measurement. For a description of how opti- 
mal joint measurements of spin can be implemented 
see lH, MM, IHi- 



III. ESTIMATION OF TWO OBSERVABLES 

Suppose we have 2N copies of some unknown pure qubit state and we wish to learn the
expectation values of the spin observables $A = \mathbf{a} \cdot \boldsymbol{\sigma}$ and
$B = \mathbf{b} \cdot \boldsymbol{\sigma}$, where $\mathbf{a}$ and $\mathbf{b}$ are unit
vectors. The most general approach would be to perform a collective measurement on all the
2N copies. When measuring a single observable it has been shown, however, that separate
measurements on each quantum system are optimal [11], [12]. For the case of two observables,
the added complexity may mean that performing a collective measurement could provide a
benefit. It may, however, be difficult to implement collective measurements. Instead we
shall investigate theoretically simpler methods, which will also be more experimentally
amenable.

The first strategy we shall investigate is to perform measurements of A on N of the copies
and of B on the remaining N copies. The second strategy is to perform a joint measurement of
both A and B on all 2N copies of the unknown state. As a figure of merit we shall use the
averaged square error. The results of each measurement will be recorded. After all the
measurements have been performed, we will be left with a set of either 2N or 4N outcomes,
depending on which of the two measurement strategies we were using. The estimates of
$\langle A \rangle$ and $\langle B \rangle$ will depend upon which of the $2^{2N}$ or
$2^{4N}$ possible sets of measurement results was obtained. We shall denote our estimates of
$\langle A \rangle$ and $\langle B \rangle$ as $\bar{a}_j$ and $\bar{b}_k$ respectively. The
subscripts j and k are used to indicate which of the possible sets of outcomes has been used
to calculate the estimates. The square of the error in the estimates of $\langle A \rangle$
and $\langle B \rangle$ can now be defined as

$$ \epsilon_a^j = (\langle A \rangle - \bar{a}_j)^2 \quad \text{and} \quad \epsilon_b^k = (\langle B \rangle - \bar{b}_k)^2, \qquad (2) $$

respectively. For different sets of results we will obtain different estimates and thus we
will obtain different errors. Hence we shall average the error over all possible sets of
outcomes. Let $P_j$ denote the probability of obtaining the j-th set of outcomes for our
measurements of the observables upon the 2N copies, given that we have a particular state
$\rho$. Then the total averaged square error is given by

$$ \epsilon_T = \int \sum_j P_j (\langle A \rangle - \bar{a}_j)^2 \, d\rho + \int \sum_j P_j (\langle B \rangle - \bar{b}_j)^2 \, d\rho, \qquad (3) $$

where the integral $\int \ldots d\rho$ indicates an average over all pure states. To perform
this average we will assume that all the pure states are evenly distributed about the
surface of the Bloch sphere. We can introduce $\theta$ as the angle between the z axis of
the Bloch sphere and the Bloch vector of the pure state. We also introduce $\phi$ as the
angle between the x axis and the projection of the Bloch vector of the state in the xy plane
of the Bloch sphere. The average $\int \ldots d\rho$ can thus be re-expressed as
$\frac{1}{4\pi} \int_0^{2\pi} \int_0^{\pi} \ldots \sin(\theta) \, d\theta \, d\phi$.
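The Bloch-sphere average defined above can be checked numerically. The following is a minimal sketch, not part of the original paper: it assumes a pure state with Bloch vector n gives $\langle A \rangle = \mathbf{a} \cdot \mathbf{n}$, and it uses a simple midpoint rule. The function names, the grid sizes and the angle `eta = 0.7` are illustrative choices.

```python
import math

def sphere_average(f, n_theta=200, n_phi=200):
    """Approximate (1/4pi) * double integral of f(n) sin(theta) dtheta dphi.

    f takes a Bloch vector n = (x, y, z); a midpoint rule is used on both angles.
    """
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            n = (math.sin(theta) * math.cos(phi),
                 math.sin(theta) * math.sin(phi),
                 math.cos(theta))
            total += f(n) * math.sin(theta)
    return total * d_theta * d_phi / (4.0 * math.pi)

def dot(u, v):
    """Euclidean inner product of two 3-vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

eta = 0.7  # illustrative angle between a and b (an assumption, not from the paper)
a = (0.0, 0.0, 1.0)
b = (math.sin(eta), 0.0, math.cos(eta))

# Averages of <A>^2 and <A><B> over all pure states.
mean_A_sq = sphere_average(lambda n: dot(a, n) ** 2)
mean_AB = sphere_average(lambda n: dot(a, n) * dot(b, n))
```

Under these assumptions `mean_A_sq` should be close to 1/3 and `mean_AB` close to $(\mathbf{a} \cdot \mathbf{b})/3$, the two averages that the error formulas in this and the following section rely on.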

Consider now the first approach where we measure only one observable on each copy of the
state. The choice of estimate is important. For now we shall simply take the mean of the
measurement results as our estimates, i.e. $\bar{a}_j = \sum_i a_j^i / N$ and
$\bar{b}_j = \sum_i b_j^i / N$, where $a_j^i$ and $b_j^i$ are the i-th measurement results
from the j-th set of measurement outcomes of A and B respectively. The outcomes of a
measurement will be either +1 or -1. The error in $\langle A \rangle$ will be given by

$$ \epsilon_a = \int \sum_j P_j (\langle A \rangle - \bar{a}_j)^2 \, d\rho = \int \left[ \mathrm{var}(\bar{a}) + (\langle A \rangle - \langle \bar{a} \rangle)^2 \right] d\rho, \qquad (4) $$

where $\mathrm{var}(\bar{a}) = \sum_j P_j (\bar{a}_j - \langle \bar{a} \rangle)^2$ is the
variance in $\bar{a}_j$ and $\langle \bar{a} \rangle = \sum_j P_j \bar{a}_j$. It can easily
be seen that $\langle \bar{a} \rangle = \langle a^i \rangle = \langle A \rangle$ and thus
this choice of estimate is unbiased. Hence the only term contributing to the error in
equation (4) is the variance of $\bar{a}_j$. Using the definition of variance it is
straightforward to show that

$$ \mathrm{var}(\bar{a}) = \frac{1}{N^2} \sum_{m,n} \sum_j P_j a_j^m a_j^n - \langle A \rangle^2. \qquad (5) $$

The results of each measurement are independent, which means that for $m \neq n$ we have
$\sum_j P_j a_j^m a_j^n = \langle a^m a^n \rangle = \langle a^m \rangle \langle a^n \rangle$.
It is also useful to note that $(a^m)^2 = 1$. From these observations it follows that
$\mathrm{var}(\bar{a}) = (1 - \langle A \rangle^2)/N$. The error is obtained by averaging
over all pure states. If we assume that all the pure states are equally likely then we find
that the average of $\langle A \rangle^2 = \langle \mathbf{a} \cdot \boldsymbol{\sigma} \rangle^2$
is 1/3, and thus $\epsilon_a = 2/(3N)$. The calculation of the error in the estimate of
$\langle B \rangle$ proceeds in a similar way and leads to the total error being

$$ \epsilon_T = \epsilon_a + \epsilon_b = \frac{4}{3N}. \qquad (6) $$

Equation (6) was derived for the situation where each observable is measured on N copies of
the unknown state. Instead we could measure A on $N_1$ copies of the state and measure B on
the remaining $2N - N_1$ copies. It can be shown that the total error for this case would be
$\epsilon_T = 2/(3N_1) + 2/(3[2N - N_1])$. A simple calculation shows that $\epsilon_T$ has a
minimum when $N_1 = N$, which leads to the error being given by equation (6).
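The result $\epsilon_a = 2/(3N)$ and the optimal split $N_1 = N$ can be reproduced exactly with rational arithmetic. The sketch below is an illustration rather than the authors' calculation: it assumes that averaging over pure states reduces to $\langle A \rangle$ being uniform on [-1, 1], and the helper names (`beta_int`, `averaged_error`, `sample_mean`, `total_error_split`) are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def beta_int(a, b):
    """Exact integral of u**a * (1 - u)**b over [0, 1] for non-negative integers."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

def averaged_error(n, estimate):
    """Error for one observable, averaged over outcomes and over pure states.

    estimate(r, n) is the value reported when r of the n results are +1.
    With x = <A> uniform on [-1, 1] and u = (1 + x) / 2, the probability of a
    given sequence with r spin-up results is u**r * (1 - u)**(n - r).
    """
    total = Fraction(0)
    for r in range(n + 1):
        m0 = beta_int(r, n - r)                      # average of likelihood
        m1 = 2 * beta_int(r + 1, n - r) - m0         # average of x * likelihood
        m2 = (4 * beta_int(r + 2, n - r)
              - 4 * beta_int(r + 1, n - r) + m0)     # average of x**2 * likelihood
        e = estimate(r, n)
        total += comb(n, r) * (m2 - 2 * e * m1 + e * e * m0)
    return total

def sample_mean(r, n):
    """Unbiased estimate: mean of r results +1 and n - r results -1."""
    return Fraction(2 * r - n, n)

def total_error_split(n1, two_n):
    """Total error when A is measured on n1 copies and B on the remaining ones."""
    return averaged_error(n1, sample_mean) + averaged_error(two_n - n1, sample_mean)

best_n1 = min(range(1, 8), key=lambda n1: total_error_split(n1, 8))
```

For 2N = 8 copies the minimum over `n1` is found at `n1 = 4`, in line with the statement that the copies should be shared equally between the two observables.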

We will now look at the second strategy, where we perform joint measurements of both A and B
on each of the 2N copies of the unknown state. An obvious choice of estimates would be
$\bar{a}_j = \sum_i a_j^i/(2N)$ and $\bar{b}_j = \sum_i b_j^i/(2N)$. From the joint
unbiasedness condition, however, we know that the expectation values of the jointly measured
observables would not be equal to $\langle A \rangle$ and $\langle B \rangle$. Using these
estimates would therefore lead to an error that would never tend to zero. Instead we will
use the estimates $\bar{a}_j = \sum_i a_j^i/(2N\alpha)$ and
$\bar{b}_j = \sum_i b_j^i/(2N\beta)$. This is equivalent to relabeling our measurement
outcomes $\pm 1/\alpha$ and $\pm 1/\beta$ for a measurement of A and B respectively. Each
relabeled outcome for A then has mean $\langle A \rangle$ and square $1/\alpha^2$. By
following an argument similar to that used to derive equation (6) it is possible to show
that the total error takes the form

$$ \epsilon_T = \frac{1}{2N} \int \left[ \frac{1}{\alpha^2} - \langle A \rangle^2 + \frac{1}{\beta^2} - \langle B \rangle^2 \right] d\rho. \qquad (7) $$

The symmetry in the problem suggests that we will obtain a minimum for (7) when
$\alpha = \beta$, that is, when we measure both observables equally well. This can be
verified by minimizing (7) subject to the constraint imposed by (1). By averaging over all
pure states we find that the total error is

$$ \epsilon_T = \frac{1}{N} \left( \frac{1}{\alpha^2} - \frac{1}{3} \right). \qquad (8) $$

If we solve (1) for $\alpha = \beta$, we find that $\alpha = \sqrt{1/(1 + |\sin(\eta)|)}$,
where $\eta$ is the angle between $\mathbf{a}$ and $\mathbf{b}$. It can be seen that
equation (8) has the correct asymptotic behavior as for $N \to \infty$ we find that
$\epsilon_T \to 0$. A plot of the error functions, equation (6) and equation (8), is shown
in figure 1. It can be seen that when the angle between $\mathbf{a}$ and $\mathbf{b}$ is
small, joint measurements can provide a total error that is lower than in equation (6). As
the angle increases it becomes more advantageous to perform separate measurements of the
observables. It can be shown that joint measurements are better than separate measurements
when $|\sin(\eta)| < 2/3$.
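The comparison between the two unbiased strategies can be sketched in a few lines. This is an illustrative check, not the authors' code: it assumes equation (8) has the form $\epsilon_T = (1/N)(1/\alpha^2 - 1/3)$, which is consistent with the threshold $|\sin(\eta)| < 2/3$ quoted above, and the function names are hypothetical.

```python
import math

def alpha_symmetric(eta):
    """Sharpness alpha = beta that saturates inequality (1) at angle eta."""
    return math.sqrt(1.0 / (1.0 + abs(math.sin(eta))))

def joint_constraint(alpha, beta, eta):
    """Left-hand side of inequality (1) for unit vectors a, b separated by eta."""
    cross = 2.0 * alpha * beta * math.cos(eta)
    plus = math.sqrt(max(0.0, alpha ** 2 + beta ** 2 + cross))
    minus = math.sqrt(max(0.0, alpha ** 2 + beta ** 2 - cross))
    return plus + minus

def error_separate(n):
    """Equation (6): unbiased separate measurements, N copies per observable."""
    return 4.0 / (3.0 * n)

def error_joint(n, eta):
    """Assumed form of equation (8): unbiased joint measurements on 2N copies."""
    alpha = alpha_symmetric(eta)
    return (1.0 / alpha ** 2 - 1.0 / 3.0) / n

eta_cross = math.asin(2.0 / 3.0)  # angle at which the two errors coincide
for eta in (0.3, eta_cross, 1.2):
    print(round(eta, 3), error_joint(10, eta), error_separate(10))
```

Because both errors scale as 1/N, the crossing angle $\sin^{-1}(2/3)$ does not depend on the number of copies in this unbiased comparison.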

When the angle between $\mathbf{a}$ and $\mathbf{b}$ is small then in a sense the
observables A and B can be thought to share information. A joint measurement of A and B is
naturally able to exploit the shared information. This is because according to inequality
(1), when the angle between $\mathbf{a}$ and $\mathbf{b}$ is small, $\alpha$ and $\beta$ can
be close to 1, corresponding to sharper measurements of A and B. This also helps to explain
why separate measurements are more effective when the angle between $\mathbf{a}$ and
$\mathbf{b}$ gets closer to $\pi/2$, since at this angle the observables are complementary.
As $\eta$ tends to $\pi/2$ the two observables share less information, meaning that joint
measurements lose their advantage. This suggests that we could try to improve the
performance of separate measurements by making our estimates using the measurement results
of both A and B. This idea will be discussed in sections IV and V. Before this we will
discuss a simpler approach to decreasing the error in our estimates.

FIG. 1: A plot of the error as a function of $\eta$, the angle between $\mathbf{a}$ and
$\mathbf{b}$, and N, which is half the total number of copies of the unknown state. The
white surface represents the error for separate measurements of A and B, given by equation
(6). The gray surface represents the error for joint measurements, given by equation (8).



IV. BIASED ESTIMATES 

The estimates that were looked at in the previous section were unbiased. This means that
$\langle \bar{a}_j \rangle = \sum_j P_j \bar{a}_j = \langle A \rangle$ and
$\langle \bar{b}_j \rangle = \sum_j P_j \bar{b}_j = \langle B \rangle$. In [12], it was found
that adding bias could yield a lower error when estimating the expectation value of a single
observable. This raises the possibility that biased estimates could lower the error in our
estimates of the two expectation values. When adding bias, care must be taken in order to
obtain the correct asymptotic behavior. As N tends to infinity, the error should tend to
zero. In order for this to occur the bias must tend to zero as N tends to infinity. We will
now look at adding bias to both of the measurement strategies that we discussed in section
III.

Consider now the first measurement strategy, where we measured A and B on separate copies of
the unknown system. We shall bias the results of the measurement of A by rescaling the
measurement outcomes from $\pm 1$ to $\pm K$. This is equivalent to taking our estimate of
$\langle A \rangle$ to be $\bar{a} = \sum_i K a^i / N$. By following a modified form of the
argument that led to equation (6) it can be found that the error in $\langle A \rangle$ is
now

$$ \epsilon_a = \frac{(K - 1)^2}{3} + \frac{2K^2}{3N}. \qquad (9) $$

It can easily be seen that the value of K that minimizes the error is $K = N/(N + 2)$, which
leads to $\epsilon_a = 2/(3(N + 2))$ and to $\bar{a} = \sum_i a^i/(N + 2)$. A similar
argument can be followed to show that the minimum error in $\langle B \rangle$ is equal to
$\epsilon_a$, which leads to

$$ \epsilon_T = \frac{4}{3(N + 2)}. \qquad (10) $$

It can be seen that equation (10) is always less than equation (6) for $N \ge 1$. Adding
bias has thus decreased the error in our estimate.
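The minimisation of equation (9) can be verified with exact fractions. The sketch below is only an illustration of the formulas quoted above; the function names `error_biased` and `optimal_k` are hypothetical.

```python
from fractions import Fraction

def error_biased(k, n):
    """Equation (9): averaged error in <A> when the +/-1 outcomes are rescaled to +/-k."""
    k = Fraction(k)
    return (k - 1) ** 2 / 3 + 2 * k ** 2 / (3 * n)

def optimal_k(n):
    """Scaling K = N / (N + 2) that minimises equation (9)."""
    return Fraction(n, n + 2)

for n in (1, 2, 5, 20):
    k = optimal_k(n)
    # Columns: N, optimal K, biased error, unbiased error (K = 1).
    print(n, k, error_biased(k, n), error_biased(1, n))
```

For N = 1 this gives K = 1/3 and an error of 2/9, compared with 2/3 for the unbiased estimate, which shows how strongly the bias helps when very few copies are available.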

To try to understand how adding bias helps we shall look at the case when N = 1. In this
situation our estimate will just be the value that we assign for the outcome of a single
measurement. When we are not biasing the outcomes, then if we get the outcome spin up, our
estimate would be $\bar{a} = 1$. This corresponds to guessing that the initial state was the
eigenstate of $\mathbf{a} \cdot \boldsymbol{\sigma}$ with eigenvalue 1. It is possible that
our initial state was prepared as this state; however, it is more likely that this was not
the case. It is more probable that $\langle A \rangle$ is less than one, and thus it is
preferable to take our estimate $\bar{a}$ to be less than one. If bias is added, then our
estimate, based on a single spin up result, will be $K = N/(N + 2) = 1/3$. Biasing thus
stops us from overestimating the magnitude of the expectation value when N is small.

Equation (10) was derived for the situation where A and B are each measured on N copies of
the unknown state. Alternatively we could have measured A on $N_1$ copies of the state and
measured B on the remaining $2N - N_1$ copies. It can be shown that the total error in this
case is $\epsilon_T = 2/[3(N_1 + 2)] + 2/[3(2N - N_1 + 2)]$. A straightforward calculation
shows that the value of $N_1$ that minimises the total error is $N_1 = N$, and thus the
minimum error will again be given by equation (10).

We can consider a more general situation where we assign $K_1$ and $K_2$ to the measurement
results instead of $\pm K$, where $|K_1|$ need not equal $|K_2|$. It can then be shown that
a necessary condition to obtain the minimum error is that $K_1 + K_2 = 0$, and hence we
would again obtain equation (10). It is worth comparing this with the results of [12], where
it was found that the form of the estimates that minimized the error was
$[\sum_i a^i + \mathrm{Tr}(A)]/(N + d)$, where d is the dimension of the Hilbert space. In
our qubit situation this would lead us to choose $\bar{a} = \sum_i a^i/(N + 2)$ as our
estimate for $\langle A \rangle$ and $\bar{b} = \sum_i b^i/(N + 2)$ as our estimate for
$\langle B \rangle$, which is what we have found by allowing bias in our results.

We shall now investigate whether adding bias to a joint measurement will lead to better
estimates. It should be noted that biased in this sense is not the same as when we discussed
the joint unbiasedness condition in section II. Instead we mean that the joint measurements
will be such that $\sum_j P_j \bar{a}_j \neq \langle A \rangle$ and
$\sum_j P_j \bar{b}_j \neq \langle B \rangle$. Consider first the results for the observable
A. Previously we assigned the numerical values $\pm 1/\alpha$; one way of biasing the
results would be to instead assign $\pm K/\alpha$ to the results. The estimate of
$\langle A \rangle$ now becomes $\bar{a}_j = \sum_i K a_j^i/(2N\alpha)$. It can be shown
that the error for this estimate is

$$ \epsilon_a = \frac{K^2}{2N} \left( \frac{1}{\alpha^2} - \frac{1}{3} \right) + \frac{(1 - K)^2}{3}. \qquad (11) $$

The value of K that minimizes the error, $\epsilon_a$, is
$K = 2N\alpha^2/(3 - \alpha^2 + 2N\alpha^2)$. A similar argument can be followed to find the
minimum error in $\langle B \rangle$. The symmetry of the problem suggests that we should
measure A and B equally well, and thus $\alpha = \beta$. This can be verified by minimising
the total error subject to the inequality (1). The error in $\langle B \rangle$ will thus
equal $\epsilon_a$, which leads to the following expression for the total error

$$ \epsilon_T = \frac{2(3 - \alpha^2)}{3(3 - \alpha^2 + 2N\alpha^2)}. \qquad (12) $$

We should note that although we are using biased estimates, the expectation value,
$\langle A_J \rangle$, is still proportional to $\langle A \rangle$. The joint measurement
will thus still satisfy the joint unbiasedness condition.

A more general way of biasing the results is to assign $K_1/\alpha$ and $K_2/\alpha$ to the
results of the measurement of A. It can then be shown that a necessary condition to minimize
the error is that $K_1 = -K_2$. This leads to the error in $\langle A \rangle$ being equal
to equation (11) and thus the total error will again be given by equation (12). By comparing
equation (12) to equation (10), we find that biased joint measurements provide a smaller
error than biased separate measurements for values of $\eta$ up to
$\sin^{-1}(2/3) \approx 0.73$ radians. This is shown in figure 2.
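The biased joint-measurement formulas can be cross-checked numerically. The sketch below assumes equation (11) in the form $\epsilon_a = (K^2/2N)(1/\alpha^2 - 1/3) + (1 - K)^2/3$ and takes $\alpha^2 = 1/(1 + |\sin(\eta)|)$ from section III; the function names are hypothetical and the code is an illustration only.

```python
import math

def alpha_squared(eta):
    """alpha**2 for alpha = beta saturating inequality (1) at angle eta."""
    return 1.0 / (1.0 + abs(math.sin(eta)))

def error_a_joint_biased(k, n, a2):
    """Assumed form of equation (11) for N = n and alpha**2 = a2."""
    return k * k / (2.0 * n) * (1.0 / a2 - 1.0 / 3.0) + (1.0 - k) ** 2 / 3.0

def optimal_k_joint(n, a2):
    """Value of K quoted in the text as minimising equation (11)."""
    return 2.0 * n * a2 / (3.0 - a2 + 2.0 * n * a2)

def total_error_joint_biased(n, eta):
    """Equation (12) with alpha = beta chosen to saturate inequality (1)."""
    a2 = alpha_squared(eta)
    return 2.0 * (3.0 - a2) / (3.0 * (3.0 - a2 + 2.0 * n * a2))

def total_error_separate_biased(n):
    """Equation (10): biased separate measurements."""
    return 4.0 / (3.0 * (n + 2))

eta_cross = math.asin(2.0 / 3.0)  # roughly 0.73 radians
```

Evaluating `total_error_joint_biased(n, eta_cross)` and `total_error_separate_biased(n)` for several values of `n` gives the same number, so under these assumptions the crossing angle is again independent of N.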

When $\eta$ is small then A and B share information, which is exploited by joint
measurements. This is the reason why the error for joint measurements, equation (12), is
less than the error for separate measurements, equation (10), when $\eta$ is small. If we
wish to improve the performance of separate measurements then we will need to make the
estimates of $\langle A \rangle$ and $\langle B \rangle$ based on all the measurement
results for both observables. We will now investigate a simple way of achieving this.

FIG. 2: A plot of the error as a function of $\eta$, the angle between $\mathbf{a}$ and
$\mathbf{b}$, and N, which is half the total number of copies of the unknown state. The
white surface represents the error for biased separate measurements, given by equation (10).
The gray surface represents the error for biased joint measurements, given by equation (12).



Suppose we perform separate measurements of the two observables and obtain the set of
measurement outcomes $\{a^m, b^n\}$. We shall take our estimate of $\langle A \rangle$ to be

$$ \bar{a} = \frac{K}{N} \sum_m a^m + \frac{\lambda}{N} \sum_n b^n, \qquad (13) $$

where $\lambda$ is a constant that weights the results for B. The constant K is included to
bias the results of A. This additional biasing may be needed as we should expect that there
will be situations where $\lambda = 0$, for instance when $\mathbf{a} \cdot \mathbf{b} = 0$.
It can be shown that

$$ \epsilon_a = \int \left[ \mathrm{var}(\bar{a}) + (\langle A \rangle - \langle \bar{a} \rangle)^2 \right] d\rho = \frac{2(K^2 + \lambda^2)}{3N} + \frac{(1 - K)^2 + \lambda^2}{3} - \frac{2\lambda(1 - K)\, \mathbf{a} \cdot \mathbf{b}}{3}. \qquad (14) $$

The values of K and $\lambda$ that minimize the error are given by
$K = N[2 + N - N(\mathbf{a} \cdot \mathbf{b})^2]/[(N + 2)^2 - N^2(\mathbf{a} \cdot \mathbf{b})^2]$
and
$\lambda = 2N\, \mathbf{a} \cdot \mathbf{b}/[(2 + N)^2 - N^2(\mathbf{a} \cdot \mathbf{b})^2]$.
The estimate of $\langle B \rangle$ will take a form analogous to equation (13). This means
that the minimum error in the estimate of $\langle B \rangle$ will equal the minimum of
$\epsilon_a$. The total error will thus be

$$ \epsilon_T = \frac{4[N(\mathbf{a} \cdot \mathbf{b})^2 - N - 2]}{3[N^2(\mathbf{a} \cdot \mathbf{b})^2 - (N + 2)^2]}. \qquad (15) $$

This new value of error is always less than the error found using biased joint measurements
in equation (12) and is also less than equation (10). A more sophisticated means of
estimating the expectation values using the results of all the measurements will be
discussed in the next section.
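The weights quoted for the combined estimate (13) can be checked against equations (14) and (15). The following sketch is illustrative only; it assumes the last term of equation (14) enters with a minus sign, as obtained from the averages of $\langle A \rangle^2$, $\langle B \rangle^2$ and $\langle A \rangle \langle B \rangle$ over the Bloch sphere, and the function names are hypothetical.

```python
def error_combined(k, lam, n, c):
    """Equation (14): error in <A> for weights k, lam, N = n and c = a.b."""
    return (2.0 * (k * k + lam * lam) / (3.0 * n)
            + ((1.0 - k) ** 2 + lam * lam) / 3.0
            - 2.0 * lam * (1.0 - k) * c / 3.0)

def optimal_weights(n, c):
    """Closed-form K and lambda quoted below equation (14)."""
    d = (n + 2) ** 2 - n * n * c * c
    k = n * (2 + n - n * c * c) / d
    lam = 2.0 * n * c / d
    return k, lam

def total_error_combined(n, c):
    """Equation (15): total error for the optimally weighted estimates."""
    return 4.0 * (n * c * c - n - 2) / (3.0 * (n * n * c * c - (n + 2) ** 2))

n, c = 5, 0.6  # illustrative values, not taken from the paper
k_opt, lam_opt = optimal_weights(n, c)
print(k_opt, lam_opt, 2.0 * error_combined(k_opt, lam_opt, n, c), total_error_combined(n, c))
```

In the two limiting cases the expression reduces to results quoted elsewhere in the text: for $\mathbf{a} \cdot \mathbf{b} = 0$ the weight $\lambda$ vanishes and equation (10) is recovered, while for $\mathbf{a} \cdot \mathbf{b} = 1$ the total error becomes $2/[3(N + 1)]$.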



V. BAYESIAN INFERENCE FOR TWO 
OBSERVABLES 

The Bayesian view of probability is that it encodes 
what we know about a system. For example if some- 
one flips a coin, we might say that there is a probabil- 
ity of 1/2 that the outcome was heads. If, however, 
we are told that the coin is unevenly weighted in 
favour of tails, then knowing this changes the prob- 
ability that we assign to the outcomes. The Bayesian 
view leads to probabilities which are constantly be- 
ing changed as we learn more about the system of 
interest. The means of updating our probability dis- 
tributions is provided by Bayes' rule. This states 
that if we initially assign a probability p(x) that an event x will occur, and then find
out y, then the new probability we assign to x occurring given y, p(x|y), is given by

$$ p(x|y) = \frac{p(y|x)p(x)}{p(y)} = \frac{p(y|x)p(x)}{\int p(x,y)\,dx} = \frac{p(y|x)p(x)}{\int p(y|x)p(x)\,dx}. \qquad (16) $$



The Bayesian approach is frequently employed for 
decision and estimation problems due to the simple 
way it allows us to include new information. This 
approach will now be used to help us in the problem 
of estimating the expectation values of two observ- 
ables for a qubit. 

We shall consider performing separate measurements of the observables A and B, that is, A
will be measured on one copy and B will be measured on a different copy. The results of
these measurements will be used to update the probability distributions
$P(\langle A \rangle)$ and $P(\langle B \rangle)$, which will in turn be used to estimate
the expectation values. Initially only the results of a measurement of A will be used to
update $P(\langle A \rangle)$ and likewise for $P(\langle B \rangle)$. We will find that
this approach leads to the same total error, equation (10), as we found for biased
measurements. After this we use all the results to update the probability distributions
$P(\langle A \rangle)$ and $P(\langle B \rangle)$. It will be shown that this approach
allows us to estimate the expectation values with an error that is lower than equation (10).

In the following calculations we will need to perform an average over all possible pure
states, where we assume that the pure states are distributed evenly over the surface of the
Bloch sphere. The average over pure states can be expressed in spherical polar coordinates
as it was in section III. We shall take $\mathbf{a}$ to point along the z direction of the
Bloch sphere; this means that
$\langle A \rangle = \langle \mathbf{a} \cdot \boldsymbol{\sigma} \rangle = \cos(\theta)$.
It should be noted that the quantities that we will be averaging will be functions of
$\langle A \rangle$, and thus will not depend on the angle $\phi$. This means that the
average over all pure states can be written as

$$ \int \ldots d\rho = \frac{1}{2} \int_{-1}^{1} \ldots d\cos(\theta) = \frac{1}{2} \int_{-1}^{1} \ldots d\langle A \rangle. \qquad (17) $$



As we are initially ignorant of the value of $\langle A \rangle$, we shall take the initial
probability distribution of $\langle A \rangle$ to be $P(\langle A \rangle) = 1/2$ for all
values of $\langle A \rangle$. We then update this probability after performing N
measurements of the spin along $\mathbf{a}$. As these measurements are performed on separate
but identically prepared systems, updating the probability distribution after all N
measurements is equivalent to updating the probability distribution after each measurement.
From Bayes' rule, equation (16), we find that our updated probability is

$$ P(\langle A \rangle | \{a_j^i\}) = \frac{P(\{a_j^i\} | \langle A \rangle) P(\langle A \rangle)}{\int P(\{a_j^i\} | \langle A \rangle) P(\langle A \rangle)\, d\langle A \rangle}, \qquad (18) $$

where $\{a_j^i\}$ denotes the j-th set of N results for the measurement along $\mathbf{a}$.
The updated probability distribution can then be used to calculate the estimate $\bar{a}_j$
according to

$$ \bar{a}_j = \int \langle A \rangle P(\langle A \rangle | \{a_j^i\})\, d\langle A \rangle. \qquad (19) $$



As before the value of the estimate $\bar{a}_j$ will depend on the measurement results
$\{a_j^i\}$. We can estimate $\langle B \rangle$ by a completely analogous process, where we
have expressions similar to equations (18) and (19). We proceed by calculating the error
that we would obtain for an estimate of $\langle A \rangle$ for some fixed value of
$\langle A \rangle$ and then we average this result over all possible values of
$\langle A \rangle$. Thus the total error will be given by

$$ \epsilon_T = \frac{1}{2} \int_{-1}^{1} \sum_j P_j (\langle A \rangle - \bar{a}_j)^2 \, d\langle A \rangle + \frac{1}{2} \int_{-1}^{1} \sum_k P_k (\langle B \rangle - \bar{b}_k)^2 \, d\langle B \rangle, \qquad (20) $$

where $P_j$ is the probability of obtaining the j-th set of results for the N measurements
of A and $P_k$ is the probability of obtaining the k-th set of results for the measurement
of B. From symmetry the error in the estimate of $\langle B \rangle$ will be equal to the
error in the estimate of $\langle A \rangle$.

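The results of each measurement of A are independent, which means that
$P(\{a_j^i\} | \langle A \rangle)$ will simply be the product of the N single measurement
probabilities $P(a_j^i | \langle A \rangle)$, where $a_j^i$ refers to the outcome of the
i-th measurement of A. The probability $P(a_j^i | \rho)$ depends only on the component of
the Bloch vector of $\rho$ along the direction $\mathbf{a}$, which means that
$P(a_j^i | \rho) = P(a_j^i | \langle A \rangle)$. For the sake of brevity we shall introduce
the notation
$I_{n,\{a_j^i\}} = \int_{-1}^{1} \langle A \rangle^n P(\{a_j^i\} | \langle A \rangle)\, d\langle A \rangle$,
so that $\bar{a}_j = I_{1,\{a_j^i\}} / I_{0,\{a_j^i\}}$. This notation allows us to express
the error in $\langle A \rangle$ in the compact form

$$ \epsilon_a = \frac{1}{2} \sum_{\{a_j^i\}} \int_{-1}^{1} P(\{a_j^i\} | \langle A \rangle) (\langle A \rangle - \bar{a}_j)^2 \, d\langle A \rangle = \frac{1}{2} \sum_{\{a_j^i\}} \left[ I_{2,\{a_j^i\}} - 2 \bar{a}_j I_{1,\{a_j^i\}} + (\bar{a}_j)^2 I_{0,\{a_j^i\}} \right] = \frac{1}{3} - \frac{1}{2} \sum_{\{a_j^i\}} \frac{(I_{1,\{a_j^i\}})^2}{I_{0,\{a_j^i\}}}. \qquad (21) $$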



Let r denote the number of times spin up was obtained for a set of outcomes $\{a_j^i\}$.
Then we can replace the sum over all possible sets of outcomes in equation (21) with a sum
over r. This will lead to a factor $N!/[r!(N - r)!]$ appearing within the summation to take
account of the fact that some of the sets $\{a_j^i\}$ are permutations of each other and
thus different sets will have the same value of r. To calculate the error the integrals
$I_{0,r}$ and $I_{1,r}$ will need to be evaluated.

By changing the variable of integration to $u = (1 + \langle A \rangle)/2$, we obtain
$I_{0,r} = 2 \int_0^1 u^r (1 - u)^{N-r} \, du$, which is of the form of a beta function,
$2B(r + 1, N - r + 1) = 2\, r!(N - r)!/(N + 1)!$. The same change of variable allows us to
express $I_{1,r}$ as a linear combination of beta functions,
$I_{1,r} = 4B(r + 2, N - r + 1) - 2B(r + 1, N - r + 1)$. It can be shown that

$$ \epsilon_a = \frac{1}{3} - \frac{1}{2} \sum_{r=0}^{N} \frac{N!}{r!(N - r)!} \frac{(I_{1,r})^2}{I_{0,r}} = \frac{1}{3} - \sum_{r=0}^{N} \frac{4r^2 - 4Nr + N^2}{(N + 1)(N + 2)^2} = \frac{2}{3(N + 2)}. \qquad (22) $$

The calculation for the error in $\langle B \rangle$, $\epsilon_b$, is completely
analogous. Hence the total error will be the same as was found in equation (10) for the
biased estimates. The Bayesian analysis therefore naturally takes account of biasedness.
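The beta-function expressions above can be evaluated exactly to confirm that the Bayesian estimate coincides with the biased estimate of section IV. The sketch below is an illustration using rational arithmetic; the function names are hypothetical, and $I_{1,r}$ is taken as $4B(r + 2, N - r + 1) - 2B(r + 1, N - r + 1)$.

```python
from fractions import Fraction
from math import comb, factorial

def beta_fn(p, q):
    """Beta function B(p, q) for positive integers p and q."""
    return Fraction(factorial(p - 1) * factorial(q - 1), factorial(p + q - 1))

def i0(r, n):
    """I_{0,r}: integral of the likelihood of r spin-up results out of n."""
    return 2 * beta_fn(r + 1, n - r + 1)

def i1(r, n):
    """I_{1,r}: integral of <A> times the same likelihood."""
    return 4 * beta_fn(r + 2, n - r + 1) - 2 * beta_fn(r + 1, n - r + 1)

def bayes_estimate(r, n):
    """Posterior mean of <A>, equation (19), after r spin-up results out of n."""
    return i1(r, n) / i0(r, n)

def bayes_error(n):
    """Error in <A> from the first form of equation (22)."""
    total = sum(comb(n, r) * i1(r, n) ** 2 / i0(r, n) for r in range(n + 1))
    return Fraction(1, 3) - Fraction(1, 2) * total

for n in (1, 2, 5):
    print(n, [bayes_estimate(r, n) for r in range(n + 1)], bayes_error(n))
```

The posterior mean works out to $(2r - N)/(N + 2)$, which is the sum of the outcomes divided by N + 2, so the uniform prior reproduces the optimal scaling K = N/(N + 2) without it being imposed by hand.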

As was noted in sections III and IV a joint measurement can sometimes allow estimation of
two observables with an error lower than equation (10). It is, however, possible to improve
upon equation (10) while still performing separate measurements of the two observables. To
achieve this we must change the way we update the probabilities $P(\langle A \rangle)$ and
$P(\langle B \rangle)$. Instead of using only the results of measurements of A to update
$P(\langle A \rangle)$ we also include the results of the measurements of B. This means that
equation (18) changes to

$$ P(\langle A \rangle | \{a_j^m, b_k^n\}) = \frac{P(\{a_j^m, b_k^n\} | \langle A \rangle) P(\langle A \rangle)}{\int P(\{a_j^m, b_k^n\} | \langle A \rangle) P(\langle A \rangle)\, d\langle A \rangle}, \qquad (23) $$

where $\{a_j^m, b_k^n\}$ denotes the sets of outcomes $\{a_j^m\}$ and $\{b_k^n\}$, for the
measurements of A and B. Similarly the estimate of $\langle A \rangle$ changes to
$\bar{a}_j = \int \langle A \rangle P(\langle A \rangle | \{a_j^m, b_k^n\})\, d\langle A \rangle$.
The measurements of A and B are performed upon separate copies of the system, thus

$$ P(\{a_j^m, b_k^n\} | \langle A \rangle) = P(\{a_j^m\} | \langle A \rangle) P(\{b_k^n\} | \langle A \rangle) = \prod_{m=1}^{N} P(a_j^m | \langle A \rangle) \prod_{n=1}^{N} P(b_k^n | \langle A \rangle). \qquad (24) $$



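As before $P(a_j^m | \langle A \rangle) = P(a_j^m | \rho)$. It will, however, generally not
be true that $P(b_k^n | \langle A \rangle) = P(b_k^n | \rho)$. Suppose that
$\langle A \rangle$ assumes the value x. We can then relate
$P(b_k^n | \langle A \rangle = x)$ to $P(b_k^n | \rho)$ by averaging $P(b_k^n | \rho)$ over
all the pure states $\rho$ for which $\mathrm{Tr}(\rho A) = x$. Such states will be confined
to a circle on the surface of the Bloch sphere. A simple calculation shows that
$P(b_k^n | \langle A \rangle) = \int P(b_k^n | \rho)\, d\rho = 1/2 \pm (1/2)(\mathbf{a} \cdot \mathbf{b}) \langle A \rangle$,
where the + occurs when $b_k^n = 1$ and the - occurs when $b_k^n = -1$. It should also be
noted that the integration is performed over the set of pure states, $\rho$, where
$\mathrm{Tr}(\rho A) = x$. For the sake of brevity we introduce the notation

$$ I_{n,\{a_j^m, b_k^n\}} = \int_{-1}^{1} \langle A \rangle^n P(\{a_j^m, b_k^n\} | \langle A \rangle)\, d\langle A \rangle; \qquad (25) $$

as before it will at times be useful to change this notation to $I_{n,rs}$ where r and s are
the number of times a and b were found to be spin up for a given set of outcomes
$\{a_j^m, b_k^n\}$. In this notation we may express the error in $\langle A \rangle$ as

$$ \epsilon_a = \frac{1}{2} \sum_{\{a_j^m, b_k^n\}} \int_{-1}^{1} P(\{a_j^m, b_k^n\} | \langle A \rangle) (\langle A \rangle - \bar{a}_j)^2 \, d\langle A \rangle = \frac{1}{3} - \frac{1}{2} \sum_{\{a_j^m, b_k^n\}} \frac{(I_{1,\{a_j^m, b_k^n\}})^2}{I_{0,\{a_j^m, b_k^n\}}} = \frac{1}{3} - \frac{1}{2} \sum_{r,s=0}^{N} \frac{N!}{r!(N - r)!} \frac{N!}{s!(N - s)!} \frac{(I_{1,rs})^2}{I_{0,rs}}. \qquad (26) $$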



The derivation of $\epsilon_b$ follows a completely analogous
procedure, where now we seek $P(\langle B\rangle|\{a_i^m,b_k^n\})$.
From symmetry it follows that $\epsilon_b = \epsilon_a$, giving
$\epsilon_T = 2\epsilon_a$. The evaluation of the expression (26)
is straightforward for a given value of $N$. Obtaining
an expression for a general $N$ is, however, difficult.
Two simple cases exist for which equation (26)
may be readily evaluated. The first of these is when
$\mathbf{a}\cdot\mathbf{b} = 0$, meaning that $P(b_i|\langle A\rangle) = 1/2$ for all $i$, and
thus $\epsilon_T = 4/[3(N+2)]$. It should not be surprising
that we obtain the same error as was found before
in equation (10). This is because the observables
are complementary and thus we would not expect
knowledge of the results of measurements of $B$ to
help with estimates of $\langle A\rangle$. The second case is when
$\mathbf{a}\cdot\mathbf{b} = 1$, that is, when $\mathbf{a} = \mathbf{b}$. It can be shown that
in this instance $\epsilon_T = 2/[3(N+1)]$, which is less than
equation (10). The observables $A$ and $B$ are now
the same observable and if we wish to estimate a
single observable when we have $2N$ copies of an unknown
pure state, then from [12] we would expect
that $\epsilon = 1/[3(N+1)]$, which is half of $\epsilon_T = 2\epsilon_a = 2\epsilon_b$.
The results of equation (26) are thus consistent with
[12], for $A = B$. Intuitively we would expect that
these two cases should act as upper and lower limits
for the total error obtained by this approach. This
intuition is verified by numerical investigations of
$\epsilon_T$, plots of which are shown in figure 3. It can also
be seen that the error that is obtained is always less
than the error in equation (12), obtained using joint
measurements. Thus using Bayes' rule allows us to
process the measurement results so as to improve
the error such that it is no worse and often better
than the total error obtained using the joint measurements
described in sections III and IV.
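
As a rough illustration of how equation (26) might be evaluated for a given $N$, the following Python sketch computes the integrals $I_{n,rs}$ of equation (25) by Simpson's rule and sums over $r$ and $s$. It assumes a uniform prior on $\langle A\rangle$ and the conditional probability $1/2 \pm 1/2\,(\mathbf{a}\cdot\mathbf{b})\langle A\rangle$ quoted above. The choice of quadrature and all function names are ours, so this should be read as a sketch and not as the procedure used to produce figure 3.

```python
from math import comb

def simpson(f, lo=-1.0, hi=1.0, steps=2000):
    """Composite Simpson rule on [lo, hi]; `steps` must be even."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def likelihood(x, r, s, n_copies, ab):
    """Probability of one particular outcome sequence, given <A> = x.

    r (s) is the number of 'spin up' results among the n_copies
    measurements of A (B); ab stands for the scalar product a . b.
    """
    p_a = 0.5 * (1.0 + x)        # P(a = +1 | <A> = x)
    p_b = 0.5 * (1.0 + ab * x)   # P(b = +1 | <A> = x)
    return (p_a**r * (1.0 - p_a)**(n_copies - r)
            * p_b**s * (1.0 - p_b)**(n_copies - s))

def moment(n, r, s, n_copies, ab):
    """I_{n,rs} of equation (25): integral of <A>^n times the likelihood."""
    return simpson(lambda x: x**n * likelihood(x, r, s, n_copies, ab))

def error_a(n_copies, ab):
    """epsilon_a of equation (26), assuming a uniform prior on <A>."""
    total = 0.0
    for r in range(n_copies + 1):
        for s in range(n_copies + 1):
            weight = comb(n_copies, r) * comb(n_copies, s)
            i0 = moment(0, r, s, n_copies, ab)
            i1 = moment(1, r, s, n_copies, ab)
            total += weight * i1 * i1 / i0
    return 1.0 / 3.0 - 0.5 * total

if __name__ == "__main__":
    N = 4
    for ab in (0.0, 0.5, 1.0):
        print(f"a.b = {ab}: total error = {2.0 * error_a(N, ab):.6f}")
    print("4/[3(N+2)] =", 4.0 / (3.0 * (N + 2)))
    print("2/[3(N+1)] =", 2.0 / (3.0 * (N + 1)))
```

The two limiting cases provide a convenient check: the computed total error $2\epsilon_a$ should agree with $4/[3(N+2)]$ when `ab` is 0 and with $2/[3(N+1)]$ when `ab` is 1.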

The Bayesian approach that we have discussed
leads to an improvement in the error because the
results of both observables are used to make our estimates.
It is thus similar to the approach that was
discussed at the end of section IV, which led to the
error given by equation (15). We shall now compare
these two approaches for $\mathbf{a}\cdot\mathbf{b} = 0$ and $\mathbf{a}\cdot\mathbf{b} = 1$.
Examining these two situations for equation (15)
we find that $\epsilon_T = 4/[3(N+2)]$ for $\mathbf{a}\cdot\mathbf{b} = 0$ and
$\epsilon_T = 2/[3(N+1)]$ for $\mathbf{a}\cdot\mathbf{b} = 1$, which is the same as
was found for the second Bayesian approach, which
led to equation (26). For other values of $\mathbf{a}\cdot\mathbf{b}$, and
consequently other values of $\eta$, it is found that equation
(15) is always greater than the total error obtained
numerically for the Bayesian analysis.

FIG. 3: A plot comparing the error for joint and separate
measurements, as a function of $N$ for different $\eta$. The curves
show separate and joint measurements for angles of 0.1 radians
and 1 radian.



VI. ESTIMATING THE EXPECTATION 
VALUES OF THREE COMPLEMENTARY 
SPIN OBSERVABLES 

Thus far we have considered estimating two ob- 
servables. We shall now investigate measuring three 
different observables of a qubit. The observables 
that we shall consider are complementary, for ex- 
ample the x, y and z components of spin for a spin 
1/2 particle. As before we shall adopt two different 
measuring strategies, either measuring each of the 
observables separately or performing a joint measurement
of the three observables upon each system.
We assume that we are given $3N$ copies of an
unknown state and we again take as our figure of 
merit the total error, which is averaged over all the 
measurement results and all the possible pure states. 
Thus 

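$$\epsilon_T = \int\sum_{j}P(j|\rho)\left[(\langle\hat{\sigma}_x\rangle - x_j)^2 + (\langle\hat{\sigma}_y\rangle - y_j)^2 + (\langle\hat{\sigma}_z\rangle - z_j)^2\right]d\rho, \qquad (27)$$

where $x_j$, $y_j$ and $z_j$ are the estimates of $\langle\hat{\sigma}_x\rangle$, $\langle\hat{\sigma}_y\rangle$
and $\langle\hat{\sigma}_z\rangle$ respectively.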

We begin by looking at measuring each observable
separately. If we use the unbiased estimates of section II
then we find that $\epsilon_T = 2/N$. As before we
may improve upon this by using biased estimates,
which allow us to estimate the observables with an
error of $\epsilon_T = 2/(N+2)$. Using Bayesian methods
will not allow us to improve upon this error as
the three observables are all complementary to each
other.
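
The two errors quoted for separate measurements can be reproduced with a short Python sketch. It assumes that the estimate of each expectation value has the form `scale * (2r - N) / N`, where `r` is the number of spin-up results, with `scale = 1` for the unbiased estimate and `scale = N / (N + 2)` as one possible biased estimate. This form of the biased estimate is our assumption for illustration and is not taken from the text above.

```python
from math import comb

def mean_square_error(x, n_copies, scale):
    """Outcome-averaged error of the estimate scale*(2r - N)/N when <sigma> = x."""
    p_up = 0.5 * (1.0 + x)
    err = 0.0
    for r in range(n_copies + 1):
        prob = comb(n_copies, r) * p_up**r * (1.0 - p_up)**(n_copies - r)
        estimate = scale * (2 * r - n_copies) / n_copies
        err += prob * (x - estimate)**2
    return err

def total_error(n_copies, scale, steps=2000):
    """Total error for three complementary observables, N copies each.

    For pure states drawn uniformly from the Bloch sphere, each component
    <sigma_i> is uniformly distributed on [-1, 1], so the state average
    reduces to (1/2) * integral over x, done here with Simpson's rule.
    """
    h = 2.0 / steps
    acc = (mean_square_error(-1.0, n_copies, scale)
           + mean_square_error(1.0, n_copies, scale))
    for i in range(1, steps):
        acc += (4 if i % 2 else 2) * mean_square_error(-1.0 + i * h, n_copies, scale)
    single = 0.5 * acc * h / 3.0
    return 3.0 * single   # the three observables contribute equally

if __name__ == "__main__":
    N = 5
    print("unbiased:", total_error(N, 1.0), "expected 2/N =", 2.0 / N)
    print("biased:  ", total_error(N, N / (N + 2)), "expected 2/(N+2) =", 2.0 / (N + 2))
```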

Our second approach is to perform a joint measurement
that gives us information about each of
the three observables. Again we apply the condition
of joint unbiasedness so that we have $\langle\hat{\sigma}_x^J\rangle = \alpha\langle\hat{\sigma}_x\rangle$,
$\langle\hat{\sigma}_y^J\rangle = \beta\langle\hat{\sigma}_y\rangle$ and $\langle\hat{\sigma}_z^J\rangle = \gamma\langle\hat{\sigma}_z\rangle$, where $\langle\hat{\sigma}_x^J\rangle$,
$\langle\hat{\sigma}_y^J\rangle$ and $\langle\hat{\sigma}_z^J\rangle$ are the jointly measured expectation
values. In [15] a condition similar to the inequality
(1) is discussed for a joint measurement with sharpnesses
$\alpha$, $\beta$ and $\gamma$, which states that $\alpha^2+\beta^2+\gamma^2 \le 1$.
One important difference between this inequality
and the inequality (1) is that in general this inequality
is only a sufficient condition for the existence of
a joint measurement of three different spin components.
It is thus possible for a joint measurement of
three spin components to exist which does not satisfy
this inequality. A measurement of $\hat{\sigma}_x$, $\hat{\sigma}_y$ and $\hat{\sigma}_z$
will, however, not violate the inequality [22]. A short
proof of this fact is given in the appendix. With this
in mind we shall consider joint measurements that
satisfy $\alpha^2+\beta^2+\gamma^2 = 1$ to be optimal joint measurements.
We shall rescale our measurement results
from $\pm 1$ to $\pm K/\alpha$, $\pm K/\beta$ and $\pm K/\gamma$ for $x$, $y$ and $z$
respectively, where $K$ is a constant. As we are completely
ignorant of the state, symmetry would suggest
that we should measure each observable equally
well and thus $\alpha = \beta = \gamma = 1/\sqrt{3}$. We find that the
total error is given by



$$\epsilon_T = \frac{3-\alpha^2}{3-\alpha^2+2\alpha^2 N} = \frac{4}{4+N}. \qquad (28)$$

It can be seen that if we perform joint measurements 
then the total error is greater than the error obtained 
for the biased separate measurements of the observ- 
ables. 

One interesting point to note is that the joint measurement
of $\hat{\sigma}_x$, $\hat{\sigma}_y$ and $\hat{\sigma}_z$ is informationally complete [23].
In other words the joint probability distribution
that we obtain will be different for each state.
It is also worth noting that the joint measurement is
not a symmetric informationally complete measurement [24].
The reason for this is that some of the
POM elements that describe the joint measurement
are orthogonal to each other while others are not. In
conventional state tomography we reconstruct the
state by performing several different measurements
repeatedly, but informationally complete measurements
allow us to reconstruct the state by repeatedly
performing only one type of measurement. For
state estimation a useful figure of merit is the trace
distance, which is defined as $D(\rho_1,\rho_2) = \mathrm{Tr}|\rho_1-\rho_2|$,
where $|O| = \sqrt{O^\dagger O}$. It is easy to show that for two
qubits with Bloch vectors $\mathbf{c}_1$ and $\mathbf{c}_2$ the trace distance
equals $D(\rho_1,\rho_2) = |\mathbf{c}_1-\mathbf{c}_2|$. If we are given a
state $\rho$ and we estimate it to be $\rho_{est}$ then we could
choose our method such that we minimize

$$D(\rho,\rho_{est}) = \left[(\langle\hat{\sigma}_x\rangle - \bar{x})^2 + (\langle\hat{\sigma}_y\rangle - \bar{y})^2 + (\langle\hat{\sigma}_z\rangle - \bar{z})^2\right]^{1/2}, \qquad (29)$$

where $\bar{x}$, $\bar{y}$ and $\bar{z}$ are respectively the $x$, $y$ and $z$
components of the Bloch vector of $\rho_{est}$. The measurements
that we perform to estimate the state will
have different outcomes, which will lead us to obtain
different estimates of the state. Therefore the distance
is averaged over all possible measurement results.
The state which is being estimated is unknown
to us, thus we should perform a second average over
all possible input states. This figure of merit for
state estimation is connected in a simple way to the
averaged total error by

$$\epsilon_T = \int\sum_{j}P(j|\rho)\,D(\rho,\rho^{j}_{est})^2\,d\rho, \qquad (30)$$

where $\rho^{j}_{est}$ denotes our estimate of $\rho$ given we obtained
the $j$th set of measurement outcomes.
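
The identity between the trace distance and the distance between Bloch vectors can be confirmed with a small Python sketch. It builds $\rho = (1 + \mathbf{c}\cdot\hat{\boldsymbol{\sigma}})/2$ explicitly and evaluates $\mathrm{Tr}|\rho_1-\rho_2|$ from the eigenvalues of the difference, using the convention of the text (no factor of 1/2). The helper names are hypothetical and chosen only for this illustration.

```python
import math

def density_matrix(c):
    """rho = (1 + c . sigma) / 2 for a Bloch vector c = (cx, cy, cz)."""
    cx, cy, cz = c
    return [[0.5 * (1 + cz), 0.5 * (cx - 1j * cy)],
            [0.5 * (cx + 1j * cy), 0.5 * (1 - cz)]]

def trace_norm_hermitian_2x2(m):
    """Tr|M| for a Hermitian 2x2 matrix: the sum of its absolute eigenvalues."""
    tr = (m[0][0] + m[1][1]).real
    det = (m[0][0] * m[1][1] - m[0][1] * m[1][0]).real
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return abs(0.5 * (tr + disc)) + abs(0.5 * (tr - disc))

def trace_distance(c1, c2):
    """D(rho1, rho2) = Tr|rho1 - rho2| for the states with Bloch vectors c1, c2."""
    r1, r2 = density_matrix(c1), density_matrix(c2)
    diff = [[r1[i][j] - r2[i][j] for j in range(2)] for i in range(2)]
    return trace_norm_hermitian_2x2(diff)

def bloch_distance(c1, c2):
    """Euclidean distance |c1 - c2|, i.e. the right-hand side of equation (29)."""
    return math.sqrt(sum((u - v)**2 for u, v in zip(c1, c2)))

if __name__ == "__main__":
    true_state = (0.0, 0.0, 1.0)
    estimate = (0.6, 0.0, 0.8)
    print(trace_distance(true_state, estimate), bloch_distance(true_state, estimate))
```

Because the squared distance is a sum of the squared errors in the three components, averaging it over outcomes and states gives the total error of equation (27).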

In view of our earlier results we find that although 
informationally complete joint measurements allow 
us to estimate a state by performing the same mea- 
surement repeatedly, they are not as efficient as per- 
forming biased separate non-informationally com- 
plete measurements of the spin along three orthog- 
onal directions. While the informationally complete 
joint measurement is not as efficient as simply mea- 
suring along three orthogonal directions, this does 
not mean that informationally complete measure- 
ments are not useful for state estimation. For an 
example of informationally complete POMs applied 
to state estimation see [25], which investigates using
four element informationally complete POMs to
perform state tomography on a qubit. 

If our methods for estimating expectation values 
are used for state estimation, this may lead to an 
estimated state with a Bloch vector with a length 
that is greater than one. This is a consequence of 
the figure of merit we have used. In state estima- 
tion, one may choose to overcome this problem by 
renormalising the obtained Bloch vector so that it 
has unit length. An alternative approach is to use 
an estimation procedure, in which it is impossible 
to produce an estimated state with a Bloch vector 
that has length greater than one. This approach is 
adopted in [25].



VII. CONCLUSIONS 

We have investigated estimating the expectation
values of two different observables for a qubit system
that is prepared in an unknown pure state. Two different
approaches were considered: performing separate
measurements of each of the observables, or performing
a joint measurement of the observables upon
each single system. It was found that measuring the
observables separately leads to a smaller total error
in the estimates if we use either the Bayesian methods
or estimates of the form given by equation
(13). If instead only the measurement results relating
to a particular observable are used for estimating
the expectation value of that observable, then joint
measurements may be better. This is because joint
measurements naturally take account of information
which is shared between the observables when these
are not complementary.

The problem with the estimation procedures described
in sections III and IV was that estimates of
each observable were made using only the information
obtained from measurements of that observable.
When the angle between $\mathbf{a}$ and $\mathbf{b}$ is small the
observables share information. This is exploited by
joint measurements, giving a lower error. The performance
of separate measurements can be improved
if we make an estimate of an expectation value based
upon the results for both observables. By using this
approach it was found that separate measurements
lead to a smaller error than joint measurements, for
all values of the angle between $\mathbf{a}$ and $\mathbf{b}$.

Finally, we examined estimating three complementary
spin 1/2 observables of a qubit. It was found
that performing separate measurements of the
three observables leads to a smaller error than performing
a joint measurement of all three observables.
The situation of measuring three complementary
spin components of a qubit is important
for state tomography, as the results of the measurements
enable us to determine the state of the qubit.
In view of our findings it can be seen that if we wish
to perform state tomography on a qubit, when we
have a limited number of copies, then performing
separate measurements of the $x$, $y$ and $z$ spin components
is more efficient than performing an optimal
joint measurement of the three observables.

APPENDIX A

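Suppose we wish to perform a joint measurement
of $\hat{\sigma}_x$, $\hat{\sigma}_y$ and $\hat{\sigma}_z$. We shall apply the joint unbiasedness
condition so that we have $\langle\hat{\sigma}_x^J\rangle = \alpha\langle\hat{\sigma}_x\rangle$,
$\langle\hat{\sigma}_y^J\rangle = \beta\langle\hat{\sigma}_y\rangle$ and $\langle\hat{\sigma}_z^J\rangle = \gamma\langle\hat{\sigma}_z\rangle$. In order to
perform this joint measurement there must exist an
eight-element POM, $\{\Pi^{xyz}_{ijk}\}$, so that we can assign
outcomes to each of the three spin observables. We
define the three marginal POMs, which describe the
probability of obtaining a given outcome for each
of the three observables, to be $\Pi^{x}_{i} = \sum_{j,k}\Pi^{xyz}_{ijk}$,
$\Pi^{y}_{j} = \sum_{i,k}\Pi^{xyz}_{ijk}$ and $\Pi^{z}_{k} = \sum_{i,j}\Pi^{xyz}_{ijk}$. If we
assign the results $\pm 1$ to the outcomes of each spin
measurement, then for the joint unbiasedness condition
to be satisfied the marginal POMs must have
the form [15]

$$\Pi^{x}_{\pm} = \tfrac{1}{2}(1 \pm \alpha\hat{\sigma}_x), \quad \Pi^{y}_{\pm} = \tfrac{1}{2}(1 \pm \beta\hat{\sigma}_y), \quad \Pi^{z}_{\pm} = \tfrac{1}{2}(1 \pm \gamma\hat{\sigma}_z). \qquad (A1)$$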

Consider now the situation when we have two
qubits prepared in a singlet state

$$|\psi\rangle_{12} = \frac{1}{\sqrt{2}}\left(|+-\rangle_{12} - |-+\rangle_{12}\right), \qquad (A2)$$

where $\hat{\sigma}_x|\pm\rangle = \pm|\pm\rangle$. Suppose the joint measurement
$\Pi^{xyz}_{ijk}$ is performed on the first qubit and we
obtain the outcome $+++$, that is, all three spin components
are found to be spin up. Given this outcome
the second qubit will now be prepared in the state
$\hat{\rho}_2$.

If an unsharp measurement of $\hat{\sigma}_x$ were performed
on the first qubit of $|\psi\rangle$ and a projective measurement
of $\hat{\sigma}_x$ were performed on the second qubit,
then the probability distribution for the outcomes
would be $P_{12}(\pm,\pm) = 1/4(1-\alpha)$ and $P_{12}(\pm,\mp) = 1/4(1+\alpha)$.
The conditional probabilities for the results
of the measurement on the second qubit, given
that we obtained 'spin up' for the unsharp measurement
on the first qubit, are found to be

$$P_{2|1}(\pm|+) = \frac{1\mp\alpha}{2}. \qquad (A3)$$
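
The step from the joint distribution to (A3) is an application of Bayes' rule and can be written out explicitly. The symbol $P_1(+)$, denoting the marginal probability of 'spin up' on the first qubit, is notation introduced here for the sketch and is not used elsewhere in the text.

```latex
\begin{align*}
P_1(+) &= P_{12}(+,+) + P_{12}(+,-)
        = \tfrac{1}{4}(1-\alpha) + \tfrac{1}{4}(1+\alpha) = \tfrac{1}{2}, \\
P_{2|1}(\pm|+) &= \frac{P_{12}(+,\pm)}{P_1(+)}
        = \frac{\tfrac{1}{4}(1\mp\alpha)}{\tfrac{1}{2}}
        = \frac{1\mp\alpha}{2}.
\end{align*}
```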

A joint measurement of the $x$, $y$ and $z$ spin components
will implement three unsharp spin measurements
described by the marginal POMs (A1). If
we obtain the outcome $+++$ and thus prepare the
second qubit in the state $\hat{\rho}_2 = 1/2(1 + \mathbf{c}\cdot\hat{\boldsymbol{\sigma}})$, then
the probability of obtaining the result $\pm$ for a
projective measurement of $\hat{\sigma}_x$ will be $\langle\pm|\hat{\rho}_2|\pm\rangle = (1 \pm c_x)/2$.
If the joint measurement does indeed implement
an unsharp measurement of $\hat{\sigma}_x$, with sharpness
$\alpha$, then $P_{2|1}(\pm|+) = (1\mp\alpha)/2 = \langle\pm|\hat{\rho}_2|\pm\rangle$.
From this we obtain that $c_x = -\alpha$. Using similar
arguments we can find that $c_y = -\beta$ and $c_z = -\gamma$.
A necessary condition for $\hat{\rho}_2$ to be a valid density
operator is that $|\mathbf{c}|^2 \le 1$, which implies that

$$\alpha^2+\beta^2+\gamma^2 \le 1. \qquad (A4)$$



We have thus established that (A4) is a necessary
condition for us to be able to perform a joint measurement
of $\hat{\sigma}_x$, $\hat{\sigma}_y$ and $\hat{\sigma}_z$, with sharpnesses of $\alpha$, $\beta$
and $\gamma$, respectively.